Abstract

Two classical sources of imprecision in static analysis by abstract interpretation are widening and merge operations. Merge operations can be done away with by distinguishing paths, as in trace partitioning, at the expense of enumerating an exponential number of paths.

In this article, we describe how to avoid such systematic exploration by focusing on a single path at a time, designated by SMT-solving. Our method combines well with acceleration techniques, thus doing away with widenings as well in some cases. We illustrate it over the well-known domain of convex polyhedra.
1 Introduction

Program analysis aims at automatically checking that programs fit their specifications, explicit or not (e.g. "the program does not crash" is implicit). Program analysis is impossible unless at least one of the following holds: it is unsound (some violations of the specification are not detected), incomplete (some correct programs are rejected because spurious violations are detected), or the state space is finite (and not too large, so as to be enumerated explicitly or implicitly). Abstract interpretation is sound, but incomplete: it over-approximates the set of behaviours of the analysed program; if the over-approximated set contains incorrect behaviours that do not exist in the concrete program, then false alarms are produced. A central question in abstract interpretation is to reduce the number of false alarms, while keeping memory and time costs reasonable [8].

Our contribution is a method leveraging the improvements in SMT-solving to increase the precision of invariant generation by abstract fixpoint iterations. On practical examples from the literature and industry, it performs better than previous generic techniques and is less "ad-hoc" than syntactic heuristics found in some pragmatic analyzers.
Listing 1: C implementation of y = sin(x)/x - 1, with the -0.01 < x < 0.01 range implemented using a Taylor expansion around zero in order to avoid loss of precision and division by zero as sin(x) - x → 0.

if (x >= 0) { xabs = x; } else { xabs = -x; }
if (xabs >= 0.01) {
  y = sin(x) / x - 1;
} else {
  xsq = x*x; y = xsq*(-1/6. + xsq/120.);
}


The first source of imprecision in abstract interpretation is the choice of the 
set of properties represented inside the analyser (the abstract domain). Obviously, 
if the property to be proved cannot be reflected in the abstract domain (e.g. we 
wish to prove a numerical relation but our abstract domain only considers Boolean 
variables), then the analysis cannot prove it. 

In order to prove that there cannot be a division by zero in the first branch of the second if-then-else of Listing 1, one would need the non-convex property that $x \geq 0.01 \lor x \leq -0.01$. An analysis representing the invariant at that point in a domain of convex properties (intervals, polyhedra, etc.) will fail to prove the absence of division by zero (incompleteness).
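
The loss of the non-convex property can be illustrated on intervals, the simplest convex domain. The following is a minimal sketch, not the analysis of any particular tool: the helper names `hull` and `contains` are ours, and the two half-lines stand for the two convex pieces of the property above.

```python
import math


def hull(a, b):
    """Smallest interval containing both intervals a and b (interval 'convex hull')."""
    return (min(a[0], b[0]), max(a[1], b[1]))


def contains(interval, value):
    """True if value lies inside the closed interval."""
    return interval[0] <= value <= interval[1]


# The two convex pieces of the property x >= 0.01 or x <= -0.01.
right_piece = (0.01, math.inf)
left_piece = (-math.inf, -0.01)

# Kept separately, neither piece contains the divisor value 0.
safe_when_separate = not contains(right_piece, 0.0) and not contains(left_piece, 0.0)

# Forced into a single convex element, the hull contains 0 again.
merged = hull(left_piece, right_piece)
safe_when_merged = not contains(merged, 0.0)

print(safe_when_separate, merged, safe_when_merged)
```

Keeping the two pieces apart is what disjunctive domains or path distinction achieve; merging them is what a convex domain is forced to do.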

Obviously, we could represent such properties using disjunctions of convex polyhedra, but this leads to combinatorial explosion as the number of polyhedra grows: at some point heuristics are needed for merging polyhedra in order to limit their number; it is also unclear how to obtain good widening operators on such domains. The same expressive power can alternatively be obtained by considering all program paths separately ("merge over all paths") and analysing them independently of each other. In order to avoid combinatorial explosion, the trace partitioning approach [36] applies merging heuristics. In contrast, our method relies on the power of modern SMT-solving techniques.

The second source of imprecision is the use of widening operators [14]. When analysing loops, static analysis by abstract interpretation attempts to obtain an inductive invariant by computing an increasing sequence $X_1, X_2, \dots$ of sets of states, which are supersets of the sets of states reachable in at most $1, 2, \dots$ iterations. In order to enforce convergence within finite time, the most common method is to use a widening operator, which extrapolates the first iterates of the sequence to a candidate limit. Optional narrowing iterations may regain some precision lost by widening.

Illustrating Example  Consider Listing 2, a simplification of a fragment of an actual industrial reactive program: indexing of a circular buffer used only at certain iterations of the main loop of the program, chosen non-deterministically. If the non-deterministic choice nondet() is replaced by true, analysis with widening and narrowing finds [0,99]. Unfortunately, the "narrowing" trick is brittle, and on Listing 2 widening yields $[0, +\infty)$, and this is not improved by narrowing.¹

Listing 2: Circular buffer indexing

int x = 0;
while (true) {
  if (nondet()) {
    x = x + 1;
    if (x >= 100) x = 0;
  }
}

In contrast, our semantically-based method would compute the [0,99] invariant on this example by first focusing on the following path inside the loop:

Listing 3: Example focus path

assume(nondet()); x = x + 1; assume(x < 100);

If we wrap this path inside a loop, then the least inductive invariant is [0,99]. We then check that this invariant is inductive for the original loop.

This is the basic idea of our method: it performs fixpoint iterations by focusing 
temporarily on certain paths in the program. In order to obtain the next path, it 
performs bounded model checking using SMT-solving. 
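
The two steps just described (least invariant of the focus path wrapped in a loop, then an inductiveness check against the whole loop body) can be sketched on integer intervals. This is an illustration under our own assumptions: intervals are pairs, `None` is the empty set, and the three path functions are hand-written abstractions of the paths of Listing 2, not the output of any tool. Plain iteration without widening terminates here only because the focus path is bounded by its guard.

```python
def join(a, b):
    """Interval hull; None stands for the empty set."""
    if a is None:
        return b
    if b is None:
        return a
    return (min(a[0], b[0]), max(a[1], b[1]))


def leq(a, b):
    """Inclusion test: is interval a contained in interval b?"""
    if a is None:
        return True
    return b is not None and b[0] <= a[0] and a[1] <= b[1]


def focus_path(iv):
    """x = x + 1; assume(x < 100), over integers (x < 100 means x <= 99)."""
    if iv is None:
        return None
    lo, hi = iv[0] + 1, min(iv[1] + 1, 99)
    return (lo, hi) if lo <= hi else None


def reset_path(iv):
    """x = x + 1; assume(x >= 100); x = 0."""
    if iv is None:
        return None
    return (0, 0) if iv[1] + 1 >= 100 else None


def skip_path(iv):
    """nondet() returned false: x is unchanged."""
    return iv


def least_invariant(path, init):
    """Iterate inv := inv join path(inv) until it is stable (no widening)."""
    inv = init
    while True:
        nxt = join(inv, path(inv))
        if nxt == inv:
            return inv
        inv = nxt


inv = least_invariant(focus_path, (0, 0))
inductive = all(leq(p(inv), inv) for p in (focus_path, reset_path, skip_path))
print(inv, inductive)  # (0, 99) True
```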



2 Background and Notations in Abstract Interpretation 




[Figure 1, panels: (a) with original variables; (b) SSA version, where $x = \phi(e_1, e_2, \dots)$ denotes an SSA $\phi$-node: $x$ takes value $e_1$ if control flows from the first incoming edge, $e_2$ from the second, and so on.]

Figure 1: Control flow graph corresponding to Listing 2

We consider programs defined by a control flow graph: a set $P$ of control points, for each control point $p \in P$ a (possibly empty) set $I_p$ of initial values, a set $E \subseteq P \times P$ of directed edges, and the semantics $\tau_e : \mathcal{P}(\Sigma) \to \mathcal{P}(\Sigma)$ of each edge $e \in E$, where $\Sigma$ is the set of possible values of the tuple of program variables. $\tau_e$ thus maps a set of states before the transition expressed by edge $e$ to the set of states after the transition.

¹ On this example, it is possible to compute the [0,99] invariant by so-called "widening up-to" [28, Sec. 3.2], or with "thresholds" [8]: essentially, the analyser notices syntactically the comparison $x < 100$ and concludes that 99 is a "good value" for $x$, so instead of widening directly to $+\infty$, it first tries 99. This method only works if the interesting value is a syntactic constant.

To each control point $p \in P$ we attach a set $X_p \subseteq \Sigma$ of reachable values of the tuple of program variables at program point $p$. The concrete semantics of the program is the least solution of a system of semantic equations [14]:

$$X_p = I_p \cup \bigcup_{(p',p) \in E} \tau_{(p',p)}(X_{p'})$$

Abstract interpretation replaces the concrete sets of states in $\mathcal{P}(\Sigma)$ by elements of an abstract domain $D$. In lieu of applying exact operations $\tau$ to sets of concrete program states, we apply abstract counterparts $\tau^\sharp$.² An abstraction $\tau^\sharp$ of a concrete operation $\tau$ is deemed to be correct if it never "forgets" states:

$$\forall X \in D \quad \tau(X) \subseteq \tau^\sharp(X) \qquad (1)$$

² Many presentations of abstract interpretation distinguish the abstract element $x^\sharp \in D$ from the set of states $\gamma(x^\sharp)$ it represents. We opted not to, for the sake of brevity.

We also assume an "abstract union" operation $\sqcup$, such that $X \cup Y \subseteq X \sqcup Y$. For instance, $\Sigma$ can be $\mathbb{Q}^n$, $D$ can be the set of convex polyhedra and $\sqcup$ the convex hull operation [27, 17, 3].

In order to find an inductive invariant, one solves a system of abstract semantic inequalities:

$$\forall p \quad I_p \subseteq X_p \;\land\; \forall (p',p) \in E \quad \tau^\sharp_{(p',p)}(X_{p'}) \subseteq X_p \qquad (2)$$

Since the $\tau^\sharp_{(p',p)}$ are correct abstractions, it follows that any solution of such a system defines an inductive invariant; one wishes to obtain one that is as strong as possible ("strong" meaning "small with respect to $\subseteq$"), or at least sufficiently strong as to imply the desired properties.

Assuming that all functions $\tau^\sharp_{(p',p)}$ are monotonic with respect to $\subseteq$, and that $\sqcup$ is the least upper bound operation in $D$ with respect to $\subseteq$, one obtains a system of monotonic abstract equations: $X_p = I_p \sqcup \bigsqcup_{(p',p) \in E} \tau^\sharp_{(p',p)}(X_{p'})$. If $(D, \subseteq)$ has no infinite ascending sequences ($d_1 \subsetneq d_2 \subsetneq \dots$ with $d_1, d_2, \dots \in D$), then one can solve such a system by iteratively replacing the contents of the variable on the left hand side by the value of the right hand side, until a fixed point is reached. The order in which equations are iterated does not change the final result.

Many interesting abstract domains, including that of convex polyhedra, have infinite ascending sequences. One then classically uses an extrapolation operator known as widening and denoted by $\nabla$ in order to enforce convergence within finite time. The iterations then follow the "upward iteration scheme":

$$X_p := X_p \nabla \Big( X_p \sqcup I_p \sqcup \bigsqcup_{(p',p) \in E} \tau^\sharp_{(p',p)}(X_{p'}) \Big) \qquad (3)$$

where the contents of the left hand side gets replaced by the value of the right hand side. The convergence property is that any sequence $u_n$ of elements of $D$ of the form $u_{n+1} = u_n \nabla v_n$, where $v_n$ is another sequence, is stationary [14]. It is sufficient to apply widening only at a set of program control nodes $P_W$ such that all cycles in the control flow graph are cut. Then, through a process of chaotic iterations [13, Def. 4.1.2.0.5, p. 127], one converges within finite time to an inductive invariant satisfying Rel. (2).

Once an inductive invariant is found, it is possible to improve it by iterating the $\psi^\sharp$ function defined as $Y = \psi^\sharp(X)$, noting $X = (X_p)_{p \in P}$ and $Y = (Y_p)_{p \in P}$, with $Y_p = I_p \sqcup \bigsqcup_{(p',p) \in E} \tau^\sharp_{(p',p)}(X_{p'})$. If $X$ is an inductive invariant, then for any $k$, $\psi^{\sharp k}(X)$ is also an invariant. This technique is an instance of narrowing iterations, which may help recover some of the imprecision induced by widening [14, §4].



Algorithm 1 Classical Algorithm

$A \leftarrow \emptyset$;
for all $p \in P$ such that $I_p \neq \emptyset$ do
    $A \leftarrow A \cup \{p\}$
end for;    ▷ Initialise $A$ to the set of all non-empty initial nodes
while $A$ is not empty do    ▷ Fixpoint iteration
    Choose $p_1 \in A$
    $A \leftarrow A \setminus \{p_1\}$
    for all outgoing edges $e$ from $p_1$ do
        Let $p_2$ be the destination of $e$;
        if $p_2 \in P_W$ then
            $X_{temp} \leftarrow X_{p_2} \nabla (X_{p_2} \sqcup \tau^\sharp_e(X_{p_1}))$    ▷ Widening node
        else
            $X_{temp} \leftarrow X_{p_2} \sqcup \tau^\sharp_e(X_{p_1})$
        end if
        if $X_{temp} \not\subseteq X_{p_2}$ then    ▷ The value must be updated
            $X_{p_2} \leftarrow X_{temp}$;
            $A \leftarrow A \cup \{p_2\}$;
        end if
    end for;
end while;    ▷ End of iteration
Possibly narrow
return all $X_{p_i}$s;


A naive implementation of the upward iteration scheme described above is to maintain a work-list of program points $p$ such that $X_p$ has recently been updated and replaced by a strictly larger value (with respect to $\subseteq$), pick and remove the foremost member $p$, apply the corresponding rule $X_p := \dots$, and insert into the work-list all $p'$ such that $(p, p') \in E$ (this algorithm is formally described in Algorithm 1).
Example of Section 1 (Cont'd)  Figure 1(a) gives the control flow graph obtained by compilation of Listing 2. Node $p_2$ is the unique widening node.

The classical algorithm (with the interval abstract domain) performs the following iterations on this control flow graph:

• Initialisation: $X_{p_1} \leftarrow (-\infty, +\infty)$, $X_{p_2} \leftarrow X_{p_3} \leftarrow X_{p_4} \leftarrow \emptyset$.

• Step 1: $X_{p_2} \leftarrow [0,0]$, then the transition to $p_3$ is enabled, $X_{p_3} \leftarrow [1,1]$, then the return edge to $p_2$ gives the new point $x = 1$ to $X_{p_2}$; the new polyhedron is then $X_{p_2} = [0,1]$ after performing the convex hull. Widening gives the polyhedron $X_{p_2} = [0, +\infty)$.

(The widening operator on intervals is defined as $[x_l, x_r] \nabla [x'_l, x'_r] = [x''_l, x''_r]$ where $x''_l = x_l$ if $x_l = x'_l$ else $-\infty$, and $x''_r = x_r$ if $x_r = x'_r$ else $+\infty$.)

• Step 2: $X_{p_3}$ becomes $[1, +\infty)$. The second transition from $p_3$ to $p_2$ is thus enabled, and the back edge to $p_2$ gives the point $x = 0$ to $X_{p_2}$. At the end of step 2 the convergence is reached.

• If we perform a narrowing sequence, there is no gain of precision because of the simple loop over the control point $p_2$.
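
The iterations above can be replayed with a small work-list implementation in the spirit of Algorithm 1. The sketch below rests on assumptions of ours: the edge list is our reading of the control flow graph of Listing 2 (the figure itself is not reproduced here), variables are integers so that $x < 100$ means $x \leq 99$, `None` encodes the empty interval, and the narrowing phase is omitted.

```python
import math
from collections import deque

INF = math.inf


def join(a, b):
    """Interval hull; None is the empty set."""
    if a is None:
        return b
    if b is None:
        return a
    return (min(a[0], b[0]), max(a[1], b[1]))


def meet(a, b):
    """Interval intersection."""
    if a is None or b is None:
        return None
    lo, hi = max(a[0], b[0]), min(a[1], b[1])
    return (lo, hi) if lo <= hi else None


def leq(a, b):
    """Inclusion test on intervals."""
    if a is None:
        return True
    return b is not None and b[0] <= a[0] and a[1] <= b[1]


def widen(a, b):
    """Interval widening: a stable bound is kept, an unstable one goes to infinity."""
    if a is None:
        return b
    if b is None:
        return a
    return (a[0] if a[0] == b[0] else -INF, a[1] if a[1] == b[1] else INF)


def shift(iv, k):
    return None if iv is None else (iv[0] + k, iv[1] + k)


def const(iv, c):
    return None if iv is None else (c, c)


# Assumed encoding of the control flow graph of Listing 2.
EDGES = [
    ("p1", "p2", lambda iv: const(iv, 0)),                    # x = 0
    ("p2", "p3", lambda iv: shift(iv, 1)),                    # nondet() true: x = x + 1
    ("p2", "p2", lambda iv: iv),                              # nondet() false
    ("p3", "p2", lambda iv: meet(iv, (-INF, 99))),            # x < 100
    ("p3", "p2", lambda iv: const(meet(iv, (100, INF)), 0)),  # x >= 100: x = 0
]


def classical(edges, widening_nodes, initial):
    """Work-list fixpoint iteration with widening at the given nodes."""
    X = {}
    for src, dst, _ in edges:
        X.setdefault(src, None)
        X.setdefault(dst, None)
    X.update(initial)
    work = deque(initial)
    while work:
        p1 = work.popleft()
        for src, p2, transfer in edges:
            if src != p1:
                continue
            candidate = join(X[p2], transfer(X[p1]))
            if p2 in widening_nodes:
                candidate = widen(X[p2], candidate)
            if not leq(candidate, X[p2]):  # the value must be updated
                X[p2] = candidate
                if p2 not in work:
                    work.append(p2)
    return X


result = classical(EDGES, {"p2"}, {"p1": (-INF, INF)})
print(result["p2"], result["p3"])
```

On this encoding the upper bound at `p2` is lost to widening, which matches the $[0, +\infty)$ result of the steps above.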



3 Our Method 

We have seen two examples of programs where classical polyhedral analysis fails 
to compute good invariants. How could we improve on these results? 

• In order to get rid of the imprecision in Listing 1, one could "explode" the control-flow graph: in lieu of a sequence of $n$ if-then-else, with $n$ merge nodes with 2 input edges, one could distinguish the $2^n$ program paths, and have a single merge node with $2^n$ input edges. As already pointed out, this would lead to exponential blowup in both time and space.

• One way to get rid of the imprecision of classical analysis (Sec. 2) on the program from Fig. 1(a) would be to consider each path through the loop at a time and compute a local invariant for this path. Again, the number of such paths could be exponential in the number of tests inside the loop.

The contribution of our article is a generic method that addresses both of these 
difficulties. 



3.1 Reduced Transition Multigraph and Path Focusing 

Consider a control flow graph $(P, E)$ with associated transitions $(\tau_e)_{e \in E}$, a set of widening points $P_W \subseteq P$ such that removing $P_W$ cuts all cycles in the graph, and a set $P_R$ of abstraction points, such that $P_W \subseteq P_R \subseteq P$ (on the figures, the nodes in $P_R$ are in bold). We make no assumption regarding the choice of $P_W$; there are classical methods for choosing widening points [9, §3.6]. $P_R$ can be taken equal to $P_W$, or may include other nodes; this makes sense only if these nodes have several incoming edges. Including other nodes will tend to reduce precision, but may improve scalability. We also make the simplifying assumption that the set of initial values $I_p$ is empty for all nodes in $P \setminus P_R$; in other words, the set of possible control points at program start-up is included in $P_R$.

We construct (virtually) the reduced control multigraph $(P_R, E_R)$, with edges $E_R$ consisting of the paths in $(P, E)$ that start and finish on nodes in $P_R$, with associated semantics the composition of the semantics of the original edges: $\tau_{e_1 \to \dots \to e_n} = \tau_{e_n} \circ \dots \circ \tau_{e_1}$. There are only a finite number of such edges, because the original graph is finite and removing $P_R$ cuts all cycles. There may be several edges between two given nodes, because there may exist several control paths between these nodes in the original program. Equivalently, this multigraph can be obtained by starting from the original graph $(P, E)$ and by removing all nodes in $P \setminus P_R$ as follows: each couple of edges $e_1$, from $p_1$ to $p$, and $e_2$, from $p$ to $p_2$, is replaced by a single edge from $p_1$ to $p_2$ with semantics $\tau_{e_2} \circ \tau_{e_1}$.
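
For illustration only, the reduced multigraph can be enumerated explicitly on a small example (the method itself never builds it). The sketch below assumes edges are given as `(source, destination, label)` triples, where the string labels stand in for the edge semantics, and that removing the abstraction points cuts all cycles, so the recursion terminates. The edge list is our reading of the control flow graph of Listing 2.

```python
def reduced_edges(edges, abstraction_points):
    """Enumerate the paths that start and end in abstraction_points and
    only traverse other nodes in between; each path is a list of labels."""
    successors = {}
    for src, dst, label in edges:
        successors.setdefault(src, []).append((dst, label))

    result = []

    def extend(start, node, labels):
        for dst, label in successors.get(node, []):
            path = labels + [label]
            if dst in abstraction_points:
                result.append((start, dst, path))  # one edge of the multigraph
            else:
                extend(start, dst, path)           # keep composing

    for start in sorted(abstraction_points):
        extend(start, start, [])
    return result


# Assumed encoding of the control flow graph of Listing 2.
EDGES = [
    ("p1", "p2", "x := 0"),
    ("p2", "p3", "x := x + 1"),
    ("p2", "p2", "skip"),
    ("p3", "p2", "assume x < 100"),
    ("p3", "p2", "assume x >= 100; x := 0"),
]

reduced = reduced_edges(EDGES, {"p1", "p2"})
for src, dst, path in reduced:
    print(src, "->", dst, ":", "; ".join(path))
```

With $n$ successive tests between two abstraction points this enumeration yields up to $2^n$ edges, which is why the algorithm relies on an SMT-solver to pick one path at a time instead.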



Example of Section 1 (Cont'd)  The reduced control flow graph obtained for our running example is

[Figure: reduced control flow graph, with edges labelled "guard $x < 99$; $x := x + 1$" and "guard $x \geq 99$; $x := 0$"]


Our analysis algorithm performs chaotic iterations over that reduced multigraph, without ever constructing it explicitly. We start from an iteration strategy, that is, a method for choosing which of the equations to apply next; one may for instance take a variant of the naive "breadth-first" algorithm from Sec. 2, but any iteration strategy [9, §3.7] befits us (see also Alg. 1). An iteration strategy maintains a set of "active nodes", which initially contains all nodes $p$ such that $I_p \neq \emptyset$. It picks one edge $e$ from an active node $p_1$ to a node $p_2$, and applies $X_{p_2} := X_{p_2} \sqcup \tau^\sharp_e(X_{p_1})$ in the case of a node $p_2 \in P_R \setminus P_W$, and applies $X_{p_2} := X_{p_2} \nabla (X_{p_2} \sqcup \tau^\sharp_e(X_{p_1}))$ if $p_2 \in P_W$; then $p_2$ is added to the set of active nodes if the value of $X_{p_2}$ has changed.

Our alteration to this algorithm is that we only pick edges $e$ from $p_1$ to $p_2$ such that there exist $x_1 \in X_{p_1}$, $x_2 \in \tau_e(\{x_1\})$ and $x_2 \notin X_{p_2}$ with the current values of $X_{p_1}$ and $X_{p_2}$. In other words, going back to the original control flow graph, we only pick paths that add new reachable states to their end node, and we temporarily focus on such a path.

How do we find such edges $e$ out of potentially exponentially many? We express them as the solution of a bounded reachability problem (how can we go from control state $p_1$ with variable state in $X_{p_1}$ to control state $p_2$ with variable state not in $X_{p_2}$?), which we solve using satisfiability modulo theory (SMT). (See Alg. 2.)
3.2 Finding Focus Paths

We now make the assumption that both the program transition semantics $\tau_e$ and the abstract elements $x^\sharp \in D$ can be expressed within a decidable theory $T$ (this assumption may be relaxed by replacing the concrete semantics, including e.g. multiplicative arithmetic, by a more abstract one through e.g. linearization [30]).

Such is for instance the case if the program operates on rational values, so a program state is an element of $\Sigma = \mathbb{Q}^n$, all operations in the program, including guards and assignments, are linear arithmetic, and the abstract domain is the domain of convex polyhedra over $\mathbb{Q}^n$, in which case $T$ can be the theory of linear real arithmetic (LRA). If program variables are integer, with program state space $\Sigma = \mathbb{Z}^n$, but still retaining the abstract domain of convex polyhedra over $\mathbb{Q}^n$, then we can take $T$ to be the theory of linear integer arithmetic (LIA). Deciding the satisfiability of quantifier-free formulas in either LIA or LRA, with atoms consisting in propositional variables and in linear (in)equalities with integer coefficients, is NP-complete. There however exist efficient decision procedures for such formulas, known as SMT-solvers, as well as standardised theories and file formats [4]; notable examples of SMT-solvers capable of dealing with LIA and LRA are Z3 and Yices. Kroening & Strichman [29] give a good introduction to the techniques and algorithms in SMT solvers.

We assume that the program is expressed in SSA form, with each program variable being assigned a value at only a single point within the program [18]; standard techniques exist for converting to SSA. Figure 1 gives both "normal" and SSA-form control-flow graphs for Listing 2.

We transform the original control flow graph $(P, E)$ in SSA form by disconnecting the nodes in $P_R$: each node $p_r$ in $P_R$ is split into a "source" node $p_r^s$ with only outbound edges, and a "destination" node $p_r^d$ with only inbound edges. We call the resulting graph $(P', E')$. Figure 2(a) gives the disconnected SSA form graph for Listing 2, where $p_1$ and $p_2$ have been split.

We consider execution traces starting from a $p^s$ node and ending in a $p^d$ node. We define them as for doing bounded model checking [2]. To each node $p \in P'$ we attach a Boolean $b_p$, or reachability predicate, expressing that the trace goes through program point $p$. For nodes $p'$ not of the form $p^s$, we have a constraint $b_{p'} = \bigvee_p e_{p,p'}$, for $e_{p,p'}$ ranging over all incoming edges. To each edge $p \to p'$ we attach a Boolean $e_{p,p'}$, and a constraint $e_{p,p'} = b_p \land \tau_{p,p'}$. The conjunction $\rho$ of all these constraints expresses the transition relation between the $p^s$ and $p^d$ nodes (with implicit existential quantification).

If the transitions $\tau_{(p,p')}$ are non-deterministic, a little care must be exercised for the path obtained from the $b_p$ to be unique. For instance, if from program point $p_1$ one can move non-deterministically to $p_2$ or $p_3$ through edges $e_2$ and $e_3$, an incorrect way of writing the formula would be $(b_2 = e_2) \land (b_3 = e_3) \land (e_2 = b_1) \land (e_3 = b_1)$, in which case $b_2$ and $b_3$ could be simultaneously true. Instead, we introduce special "choice" variables $c_i$ that model non-deterministic choices (Fig. 2).

In order to find a path from program point $p_1 \in P_R$, with variable state $x_1$, to program point $p_2 \in P_R$, with variable state $x_2$, we simply conjoin $\rho$ with the formulas $x_1 \in X_{p_1}$ and $x_2 \notin X_{p_2}$, with $x_1$, $x_2$, $x_1 \in X_{p_1}$ and $x_2 \notin X_{p_2}$ expressed in terms of the SSA variables.³ For instance, if $X_{p_1}$ and $X_{p_2}$ are convex polyhedra defined by systems of linear inequalities, one simply writes these inequalities using the names of the SSA-variables at program points $p_1$ and $p_2$.

[Figure 2, panels: (a) disconnected (SSA) CFG; (b) with a focus path (solid edges) from $x_2 = 0$ at program point 2 to $x_2' = 1$ at the same program point.]

$$\rho \triangleq (e_1 = (x_1 = 0) \land b_1^s) \land (e_3 = (x_3 = x_2 + 1) \land b_2^s \land c_2^s) \land (e_2 = b_2^s \land \neg c_2^s) \land (e_5 = b_3 \land x_3 \geq 100 \land x_4 = 0) \land (e_4 = b_3 \land x_3 < 100) \land (b_3 = e_3) \land (b_2^d = e_1 \lor e_4 \lor e_5 \lor e_2) \land (x_2' = \mathrm{ite}(e_1, x_1, \mathrm{ite}(e_5, x_4, \mathrm{ite}(e_4, x_3, x_2))))$$

Figure 2: Disconnected version of the SSA control flow graph of Fig. 1(b), and the corresponding SMT formula. $\mathrm{ite}(b, e_1, e_2)$ is a SMT construct whose value is "if $b$ then the value of $e_1$ else the value of $e_2$". To each node $p_x$ corresponds a Boolean $b_x$ and an optional choice variable $c_x$; to each edge, a Boolean $e_y$.

We apply SMT-solving over that formula. The result is either "unsatisfiable", in which case there is no path from $p_1$, with variable values $x_1$, to $p_2$, with variable values $x_2$, such that $x_1 \in X_{p_1}$ and $x_2 \notin X_{p_2}$, or "satisfiable", in which case SMT-solving also provides a model of the formula (a satisfying assignment of its free variables); from this model we easily obtain such a path, unique by construction of $\rho$.

Indeed, a model of this formula yields a trace of execution: those $b_p$ predicates that are true designate the program points through which the trace goes, and the other variables give the values of the program variables.



Example of Section 1 (Cont'd)  The SSA form of the control flow graph of Figure 1(a) is depicted in Figure 1(b). Fig. 2 shows the disconnected version of the SSA graph (the node $p_2$ is now split), and the formula $\rho$ expressing the semantics is shown beneath it.

Then, consider the problem of finding a path starting in control point 2 inside polyhedron $x = 0$ and ending at the same control point but outside of that polyhedron. Note that because there are two outgoing transitions from node $p_2^s$, which are chosen non-deterministically, we had to introduce a Boolean choice variable $c_2^s$.

The focus path of Fig. 2(b) was obtained by solving the formula $\rho \land b_1^s = \mathit{false} \land b_2^s = \mathit{true} \land b_2^d = \mathit{true} \land (x_2 = 0) \land \neg(x_2' = 0)$: we impose that the path starts at point $p_2^s$ (thus forcing $b_1^s = \mathit{false} \land b_2^s = \mathit{true}$) in the polyhedron $x = 0$ (thus $x_2 = 0$) and ends at point $p_2^d$ (thus forcing $b_2^d = \mathit{true}$) outside of that polyhedron (thus $\neg(x_2' = 0)$).

³ The formula defining the set of values represented by an abstract element $X$ has sometimes been denoted by $\hat{\gamma}$ [34].
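
The query above can be replayed without an SMT-solver on this tiny example. In the sketch below a brute-force search over the free choices stands in for SMT-solving, which is an assumption made purely for illustration; the SSA assignments ($x_1 = 0$, $x_3 = x_2 + 1$, $x_4 = 0$) are used as definitions rather than constraints, and the function names are ours.

```python
def evaluate_rho(b1s, b2s, c2s, x2):
    """Evaluate the constraints of the formula of Fig. 2 for given start
    Booleans, choice variable and input value x2."""
    x1, x3, x4 = 0, x2 + 1, 0
    e1 = b1s
    e3 = b2s and c2s
    e2 = b2s and not c2s
    b3 = e3
    e4 = b3 and x3 < 100
    e5 = b3 and x3 >= 100
    b2d = e1 or e4 or e5 or e2
    # Nested ite(...) giving the value of x at the destination copy of p2.
    x2_out = x1 if e1 else x4 if e5 else x3 if e4 else x2
    taken = [name for name, value in
             (("e1", e1), ("e2", e2), ("e3", e3), ("e4", e4), ("e5", e5)) if value]
    return b2d, x2_out, taken


def find_focus_path(start_values, in_target):
    """Brute-force stand-in for SmtSolve: look for a trace from p2, with x2
    among start_values, back to p2 with an output value outside the target."""
    for x2 in start_values:
        for c2s in (False, True):
            b2d, x2_out, taken = evaluate_rho(False, True, c2s, x2)
            if b2d and not in_target(x2_out):
                return {"x2": x2, "x2_out": x2_out, "edges": taken}
    return None  # plays the role of "unsat"


# Start inside x = 0, end outside x = 0.
first = find_focus_path([0], lambda v: v == 0)
# Start inside [0, 99], end outside [0, 99].
second = find_focus_path(range(0, 100), lambda v: 0 <= v <= 99)
print(first, second)
```

The true edge Booleans of the first answer designate the focus path; the absence of an answer to the second query is what certifies inductiveness of [0,99] at that node.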

3.3 Algorithm 

Algorithm 2 consists in the iteration of the path finding method of Sec. 3.2, coupled with forward abstract interpretation along the paths found and, optionally, path acceleration.

3.4 Correctness and Termination 

We shall now prove that this algorithm terminates, and that the resulting $X_p$ define an inductive invariant that contains all initial states $I_p$. The proof is a variant of the correctness proof of the chaotic iterations.

The invariant maintained by this algorithm is that all nodes $p_1 \in P_R \setminus A$ are such that there is no execution trace starting at point $p_1$ in a state $x_1 \in X_{p_1}$ and ending at point $p_2$ in a state $x_2 \notin X_{p_2}$. Evidently, if $A$ becomes empty, then this condition means that $(X_p)$ is an inductive invariant.

Termination is ensured by the classical argument of termination of chaotic iterations in the presence of widening: they always terminate if all cycles in the control flow graph are broken by widening points [13, Th. 4.1.2.0.6, p. 128]. In short, an infinite iteration sequence is bound to select at least one node $p$ in $P_W$ an infinite number of times, because $P_W$ breaks all cycles, but due to the properties of widening, $X_p$ should be stationary, which contradicts the infinite number of selections. Our comment at line 20 of Alg. 2 is important for termination: it ensures that for any widening node $p$, the sequence of values taken by $X_p$ when it is updated and reinserted into set $A$ is strictly ascending, which ensures termination in finite time.

3.5 Self-Loops 

The algorithm in the preceding subsection is merely a "clever" implementation of standard polyhedral analysis [17, 27] on the reduced control multigraph $(P_R, E_R)$; the difference with a naive implementation is that we do not have to explicitly enumerate an exponential number of paths and instead leave the choice of the focus path to the SMT-solver. We shall now describe an improvement in the case of self-loops, that is, single paths from one node to itself.

Algorithm 2 Path-focused Algorithm

 1: Compute SSA-form of the control flow graph.
 2: Choose $P_R$, compute the disconnected graph $(P', E')$ accordingly.
 3: $\rho \leftarrow$ computeFormula$(P', E')$    ▷ Precomputations
 4: $A \leftarrow \emptyset$;
 5: for all $p \in P_R$ such that $I_p \neq \emptyset$ do
 6:     $A \leftarrow A \cup \{p\}$
 7: end for;
 8: while $A$ is not empty do    ▷ Fixpoint iteration on the reduced graph
 9:     Choose $p_1 \in A$
10:     $A \leftarrow A \setminus \{p_1\}$
11:     repeat
12:         res $\leftarrow$ SmtSolve$\big[\rho \land b^s_{p_1} \land x_1 \in X_{p_1} \land \bigvee_{p_2 \mid (p_1,p_2) \in E'} (b^d_{p_2} \land x_2 \notin X_{p_2})\big]$
13:         if res is not "unsat" then
14:             Compute $e' \in E'$ from res    ▷ Extraction of path from the model
15:             $Y \leftarrow \tau^\sharp_{e'}(X_{p_1})$
16:             if $p_2 \in P_W$ then
17:                 $X_{temp} \leftarrow X_{p_2} \nabla (X_{p_2} \sqcup Y)$    ▷ Final point $p_2$ is a widening point
18:             else
19:                 $X_{temp} \leftarrow X_{p_2} \sqcup Y$
20:             end if    ▷ at this point $X_{temp} \not\subseteq X_{p_2}$, otherwise $p_2$ would not have been chosen
21:             $X_{p_2} \leftarrow X_{temp}$
22:             $A \leftarrow A \cup \{p_2\}$
23:         end if
24:     until res = "unsat"
25: end while    ▷ End of iteration
26: Possibly narrow (see Sec. 4.1)
27: Compute $X_{p_i}$ for $p_i \notin P_R$
28: return all $X_{p_i}$

Algorithm 3 is a variant of Alg. 2 where self-loops are treated specially:

• The loopiter$(\tau^\sharp, X)$ function returns the result of a widening / narrowing iteration sequence for abstract transformer $\tau^\sharp$ starting in $X$; it returns $X'$ such that $X \subseteq X'$ and $\tau^\sharp(X') \subseteq X'$.

• In order not to waste the precision gained by loopiter, the first time we consider a self-loop $e'$ we apply a union operation instead of a widening; set $U$ records the self-loops that have already been visited. This is a form of delayed widening [28].

Termination is still guaranteed, because the inner loop cannot loop forever: it can visit any self-loop edge $e'$ at most once before applying widening.
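Algorithm 3 Path-focused Algorithm with Self-Loops. (*) marks changes from Alg. 2

Compute SSA-form of the control flow graph.
Choose $P_R$, compute the disconnected graph $(P', E')$ accordingly.
$\rho \leftarrow$ computeFormula$(P', E')$    ▷ Precomputations
$A \leftarrow \emptyset$;
for all $p \in P_R$ such that $I_p \neq \emptyset$ do
    $A \leftarrow A \cup \{p\}$
end for;
while $A$ is not empty do    ▷ Fixpoint iteration on the reduced graph
    Choose $p_1 \in A$
    $A \leftarrow A \setminus \{p_1\}$
    $U \leftarrow \emptyset$    ▷ $U$ is a set of "already seen" edges (*)
    repeat
        res $\leftarrow$ SmtSolve$\big[\rho \land b^s_{p_1} \land x_1 \in X_{p_1} \land \bigvee_{p_2 \mid (p_1,p_2) \in E'} (b^d_{p_2} \land x_2 \notin X_{p_2})\big]$
        if res is not "unsat" then
            Compute $e' \in E'$ from res
            if $p_1 = p_2$ then    (*)
                $Y \leftarrow$ loopiter$(\tau^\sharp_{e'}, X_{p_1})$    (*)
            else
                $Y \leftarrow \tau^\sharp_{e'}(X_{p_1})$
            end if
            if $p_2 \in P_W$ and ($p_1 \neq p_2 \lor e' \in U$) then    (*)
                $X_{p_2} \leftarrow X_{p_2} \nabla (X_{p_2} \sqcup Y)$    ▷ Final point $p_2$ is a widening point
            else
                $X_{p_2} \leftarrow X_{p_2} \sqcup Y$
            end if
            $U \leftarrow U \cup \{e'\}$    (*)
            $A \leftarrow A \cup \{p_2\}$
        end if
    until res = "unsat"
end while    ▷ End of iteration
Compute $X_{p_i}$s for $p_i \notin P_R$
return all $X_{p_i}$s
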

Example of Section 1 (Cont'd)  Let us perform our algorithm on our example:

• Step 1: Is there a feasible path from control point $p_1$ to control point $p_2$ (without additional constraint)? Yes. On Figure 2, the obtained model corresponds to the transition from $p_1^s$ to $p_2^d$, and leads to the interval $X_{p_2} = [0, 0]$.

• Step 2: Is there a path from $p_2$ with $x = 0$ to $p_2$ with $x \neq 0$? The answer to this query is depicted in Figure 2(b): there is such a path, on which we now focus. This path is considered as a loop and we therefore do a local iteration with widenings (loopiter). $X_{p_2}$ becomes [0, 1], then after widening $[0, +\infty)$. A narrowing step gives finally $X_{p_2} = [0, 99]$, which is thus the result of loopiter.

• Step 3: Is there a path from $p_2$ with $x \in [0, 99]$ to $p_2$ with $x' \notin [0, 99]$? No. The iteration thus ends with the desired invariant.
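
The three steps above can be reproduced with a compact sketch of the self-loop variant on integer intervals. Several points are assumptions of ours rather than the implementation described in this article: the reduced edges are listed by hand, each with a concrete step on a single integer and a hand-written abstract transformer; `find_focus_edge` replaces SMT-solving by a bounded brute-force search, adequate only for this example; and `loopiter` uses widening followed by a single descending step.

```python
import math

INF = math.inf


def join(a, b):
    if a is None:
        return b
    if b is None:
        return a
    return (min(a[0], b[0]), max(a[1], b[1]))


def widen(a, b):
    if a is None:
        return b
    if b is None:
        return a
    return (a[0] if a[0] == b[0] else -INF, a[1] if a[1] == b[1] else INF)


def contains(iv, v):
    return iv is not None and iv[0] <= v <= iv[1]


def loopiter(transfer, x0):
    """Widening iterations on a single path, then one narrowing step."""
    x = x0
    while True:
        nxt = widen(x, join(x, transfer(x)))
        if nxt == x:
            break
        x = nxt
    return join(x0, transfer(x))


def inc_abs(iv):
    """x = x + 1; assume(x < 100)."""
    if iv is None:
        return None
    lo, hi = iv[0] + 1, min(iv[1] + 1, 99)
    return (lo, hi) if lo <= hi else None


def reset_abs(iv):
    """x = x + 1; assume(x >= 100); x = 0."""
    return None if iv is None or iv[1] + 1 < 100 else (0, 0)


# (name, source, destination, concrete step on one integer, abstract transformer)
EDGES = [
    ("init", "p1", "p2", lambda v: 0, lambda iv: None if iv is None else (0, 0)),
    ("inc", "p2", "p2", lambda v: v + 1 if v + 1 < 100 else None, inc_abs),
    ("reset", "p2", "p2", lambda v: 0 if v + 1 >= 100 else None, reset_abs),
    ("skip", "p2", "p2", lambda v: v, lambda iv: iv),
]
BOUND = 200  # search window of the brute-force oracle


def find_focus_edge(p1, X):
    """Stand-in for SmtSolve: an edge from p1 that leaves the current X at its end."""
    if X[p1] is None:
        return None
    lo, hi = int(max(X[p1][0], -BOUND)), int(min(X[p1][1], BOUND))
    for edge in EDGES:
        if edge[1] != p1:
            continue
        for v in range(lo, hi + 1):
            w = edge[3](v)
            if w is not None and not contains(X[edge[2]], w):
                return edge
    return None


def path_focused(widening_nodes):
    X = {"p1": (-INF, INF), "p2": None}
    active = ["p1"]
    while active:
        p1 = active.pop()
        seen = set()  # self-loops already visited from p1
        while True:
            edge = find_focus_edge(p1, X)
            if edge is None:
                break
            name, _, p2, _, transfer = edge
            y = loopiter(transfer, X[p1]) if p1 == p2 else transfer(X[p1])
            if p2 in widening_nodes and (p1 != p2 or name in seen):
                X[p2] = widen(X[p2], join(X[p2], y))
            else:
                X[p2] = join(X[p2], y)  # delayed widening on a first visit
            seen.add(name)
            if p2 not in active:
                active.append(p2)
    return X


result = path_focused({"p2"})
print(result["p2"])  # (0, 99)
```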

4 Extensions 
4.1 Narrowing 

Narrowing iterations can also be applied within our framework. Let us assume that some inductive invariant $(X_p)_{p \in P_R}$ has been computed; it satisfies the relation $\psi(X) \subseteq X$ component-wise, noting $X = (X_1, \dots, X_{|P|})$, and $\psi(X)$ denotes $(Y_1, \dots, Y_{|P|})$ defined as

$$Y_p = I_p \cup \bigcup_{(p',p) \in E_R} \tau_{(p',p)}(X_{p'}) \qquad (4)$$

The abstract counterpart $\psi^\sharp$ to this operator is defined similarly, replacing $\tau$ by $\tau^\sharp$ and $\cup$ by $\sqcup$. It satisfies the correctness condition (see Rel. (1)) $\forall X \in D \;\; \psi(X) \subseteq \psi^\sharp(X)$. As per the usual narrowing iterations, we compute a narrowing sequence $X^{(k)} = \psi^{\sharp k}(X)$. It is often sufficient to stop at $k = 1$; otherwise one may stop when $X^{(k+1)} = X^{(k)}$. Let us now see a practical algorithm for computing $Y = \psi^\sharp(X)$:

For all $p \in P_R$, we initialise $Y_p := I_p$. For all $p_2 \in P_R$, we consider all paths $e \in E_R$ from $p_1 \in P_R$ to $p_2$ such that there exist $x_1 \in X_{p_1}$, $x_2 \in X_{p_2}$, $x_2 \in \tau_e(\{x_1\})$, as explained in §3.2. We then update $Y_{p_2} := Y_{p_2} \sqcup \tau^\sharp_e(X_{p_1})$.

4.2 Acceleration 

In Sec. 3.5, we have described a loopiter function that performs a classical widening / narrowing iteration over a single path. In fact, the only requirement over it is that loopiter$(\tau^\sharp, X)$ returns $X'$ such that $X \subseteq X'$ and $\tau^\sharp(X') \subseteq X'$. In other words, $X'$ is an over-approximation of $\tau^*(X)$, noting $R^*$ the transitive closure of $R$.

In some cases, we can compute directly such an over-approximation, sometimes even obtaining $\tau^*(X)$ exactly; this is known as acceleration of the loop. Examples of possible accelerations include the case where $\tau$ is given by a difference bound matrix [12], an octagon [10], ultimately periodic integer relations [11] or certain affine linear relations [23, 22, 1].

For instance, the focus path of Fig. 2(b) consists in the operations and guards $x = x + 1$; $x < 100$; instead of iterating that path, we can compute its exact acceleration, yielding $x \in [0, 99]$.
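
For this particular focus path the acceleration has a closed form. The sketch below is limited to the integer path `x = x + 1; assume(x < bound)` and is not one of the general acceleration techniques cited above; `accelerate_increment` and the simulation used to cross-check it are hypothetical helpers of ours.

```python
def accelerate_increment(iv, bound):
    """Values reachable from the integer interval iv by repeating
    `x = x + 1; assume(x < bound)` zero or more times."""
    if iv is None:
        return None
    lo, hi = iv
    top = bound - 1
    if lo + 1 > top:
        return iv  # the path cannot be taken even once
    return (lo, max(hi, top))


def reachable_by_simulation(start_values, bound):
    """Concrete exploration of the same loop, used only to cross-check."""
    seen = set(start_values)
    frontier = list(start_values)
    while frontier:
        x = frontier.pop() + 1
        if x < bound and x not in seen:
            seen.add(x)
            frontier.append(x)
    return (min(seen), max(seen))


accelerated = accelerate_increment((0, 0), 100)
simulated = reachable_by_simulation([0], 100)
print(accelerated, simulated)  # (0, 99) (0, 99)
```

Such a function satisfies the requirement stated for loopiter (it returns a superset of its argument that is stable under the path), so it can be substituted for the widening / narrowing sequence on this path.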




4.3 Partitioning

It is possible to partition the states at a given program point according to some predicate or a partial history of the computation [36]. This amounts to introducing several graph nodes representing the same program point, and altering the transition relation.

4.4 Input- Output Relations 

As with other analyses using relational domains, it is possible to obtain abstractions of the input-output relation of a program block or procedure instead of an abstraction of the set of states at the current point [1]; this also allows analyzing recursive procedures [27, Sec. 7.2]. It suffices to include in the set of variables copies of the variables at the beginning of the block or procedure; then the abstract value obtained at the end of the block or procedure is the desired abstraction.

5 Implementation and Preliminary Results 

Our algorithm has been implemented as an option for Aspic, a tool that computes invariants from counter automata with Linear Relation Analysis [20]. We wrote an OCaml interface to the Yices SMT-solver [19], and modified the fixpoint computation inside Aspic to deal with local iterations of paths. The implementation still needs some improvements, but the preliminary results are promising, and we describe some of them in Table 1. We provide no timing results since we were unable to detect any overhead due to the method. These two examples show that since we avoid (some) convex hulls, the precision of the whole analysis is improved.

The rate limiter example is particularly interesting, since, like the one in Listing 1 (which does not include a loop), it will be imprecisely analyzed by any method enforcing convex invariants at intermediate steps.

6 Related Work 

Our algorithm may be understood as a form of chaotic iterations [13, §2.9.1, p. 53]
over a certain system of semantic equations; we use SMT as an oracle to know
which equations need propagating. The choice of widening points, and the order
in which to solve the abstract equations, have an impact on the precision of the
whole analysis, as well as its running time. Even though there exist few hard
general results as to which strategy is best [13, §4.1.2, p. 125], some methods tend
to experimentally behave better [9].
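The role of the oracle in a chaotic iteration can be illustrated on a deliberately simple finite lattice, namely sets of reachable nodes of a graph. In this sketch, `find_unstable` plays the part played by the SMT-solver in our method: it returns an equation (here, an edge) whose propagation would change the current solution, or reports that none exists. The graph encoding and all names are assumptions made for the illustration.

```c
#include <assert.h>

#define N 5                      /* number of graph nodes */

typedef struct { int src, dst; } edge;

/* Oracle: index of an edge whose propagation would change `reach`,
 * or -1 when the current solution is already a fixpoint. */
static int find_unstable(const edge *e, int ne, const int reach[N]) {
    for (int i = 0; i < ne; i++)
        if (reach[e[i].src] && !reach[e[i].dst]) return i;
    return -1;
}

/* Chaotic iteration driven by the oracle; returns the number of
 * propagation steps performed. */
static int chaotic_reach(const edge *e, int ne, int start, int reach[N]) {
    int steps = 0, k;
    assert(0 <= start && start < N);
    for (int i = 0; i < N; i++) reach[i] = 0;
    reach[start] = 1;
    while ((k = find_unstable(e, ne, reach)) >= 0) {
        reach[e[k].dst] = 1;     /* propagate only where it matters */
        steps++;
    }
    return steps;
}
```

Only equations that actually change the solution are ever propagated; the order in which the oracle returns them does not affect the final result on this finite lattice.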

"Lookahead widening" [24] was our main source of inspiration: iterations and 
widenings are adapted according to the discovery of new feasible paths in the
program. This approach avoids loss of precision due to widening in programs with
multiple paths inside loops. It has proved its efficacy in suppressing some gross
over-approximations induced by naive widening. However, it does not solve the
imprecisions introduced by convex hull (e.g. it produces false alarms on
Listing 1).

Table 1: Invariant generation on two simple challenging programs

Program: Listing 4: Boustrophedon

    void boustrophedon() {
      int x;
      int d;
      x = 0;
      d = 1;
      while (1) {
        if (x == 0) d = 1;
        if (x == 1000) d = -1;
        x += d;
      }
    }

Automaton: [figure not recoverable]

Result and notes: The compilation of the program gives an expanded control
structure where some paths are "clearly" unfeasible (e.g. imposing both
$x < 0$ and $x > 1000$), thus the only feasible ones are guarded by $x < 0$,
$x = 0$, $0 < x < 1000$, $x = 1000$ and $x > 1000$. The tool finds the invariant
$\{0 \le x \le 1000,\ -1 \le d \le 1\}$. Classical analysis with widening "upto"
gives $\{d \le 1,\ d + 1999 \ge 2x\}$ and Gopan and Reps' improvement is not able
to find $x \ge 0$.

Program: Listing 5: Rate limiter (source: [32])

    void main() {
      float x_old, x;
      x_old = 0;
      while (1) {
        x = input(-1000, 1000);
        if (x >= x_old + 1)
          x = x_old + 1;
        if (x <= x_old - 1)
          x = x_old - 1;
        x_old = x;
      }
    }

Automaton: [figure not recoverable]

Result and notes: In order to properly analyse such a program, Astrée
distinguishes all four execution paths inside the loop through trace
partitioning [36], which is triggered by ad hoc syntactic criteria (e.g. two
successive if-then-else). Our algorithm finds the invariant
$\{-1000 \le x_{old} \le 1000\}$, which is not found by classical analysis.
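The invariant reported for the rate limiter of Listing 5 can be sanity-checked by simulation. The sketch below assumes that inputs range over [-1000, 1000], replaces `input` by a deterministic pseudo-random generator, and uses `double` instead of `float`; the names `next_input` and `rate_limiter_holds` are illustrative. Such a test can only refute an invariant, never prove it.

```c
#include <assert.h>
#include <stdint.h>

/* Deterministic pseudo-random input in [-1000, 1000), a stand-in for
 * input(); the linear congruential generator is an assumption. */
static double next_input(uint32_t *state) {
    *state = *state * 1664525u + 1013904223u;
    return (double)(*state >> 8) / (double)(1u << 24) * 2000.0 - 1000.0;
}

/* Returns 1 if -1000 <= x_old <= 1000 held at every loop head during
 * `steps` iterations of the rate limiter, 0 otherwise. */
static int rate_limiter_holds(uint32_t seed, int steps) {
    double x_old = 0.0, x;
    assert(steps >= 0);
    for (int i = 0; i < steps; i++) {
        if (x_old < -1000.0 || x_old > 1000.0) return 0;
        x = next_input(&seed);
        if (x >= x_old + 1) x = x_old + 1;   /* limit upward slope */
        if (x <= x_old - 1) x = x_old - 1;   /* limit downward slope */
        x_old = x;
    }
    return x_old >= -1000.0 && x_old <= 1000.0;
}
```

On every path through the loop body, the new value of x_old lies between the old value and the input, which is why the bounds on the input carry over to x_old.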

Our method analyzes separately the paths between cut-nodes. We have pointed
out that this is (almost) equivalent to considering finite unions of elements of
the abstract domain, known as the finite powerset construction, between the
cut-nodes.* The finite powerset construction is however costly even for loop-free
code, and it is not so easy to come up with widening operators to apply it to
codes with loops or recursive functions [2]; for limiting the number of elements
in the unions, some may be lumped together (thus generally introducing further
over-approximation) according to affinity heuristics [37, 33].

Still, in recent years, much effort has been put into the discovery of
disjunctive invariants, for instance in predicate abstraction [25]. Of particular
note is the recent work by Gulwani and Zuleger on inferring disjunctive
invariants [26] for finding bounds on the number of iterations of loops. We
improve on their method on two points:

• In contrast to us, they assume that the transition relation is given in
disjunctive normal form [26, Def. 5], which in general has exponential size in
the number of tests inside the loop. By using SMT-solving, we keep the DNF
implicit and thus avoid this blowup.

• By using acceleration, we may obtain more precise results than using widen- 
ing, as they do for lattices that do not satisfy the ascending chain condition. 
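The blowup mentioned in the first point can be made concrete: n successive tests yield 2^n syntactic paths, hence as many disjuncts in an explicit DNF, of which some may be infeasible. The sketch below counts syntactic paths and enumerates which paths of the rate limiter body (Listing 5) are ever taken. The bounded brute-force search over integers is only a stand-in for an SMT query, which would decide feasibility symbolically; integers replace the floats of the listing, and the function names are illustrative.

```c
#include <assert.h>

/* Number of syntactic paths through n successive if-tests. */
static unsigned long syntactic_paths(unsigned n) {
    assert(n < 8 * sizeof(unsigned long));
    return 1UL << n;
}

/* Bounded stand-in for a feasibility oracle on the rate limiter body.
 * Bit (2*b1 + b2) of the result is set when the path on which the first
 * test evaluates to b1 and the second to b2 is taken for some integers
 * x_old and input in [-bound, bound]. */
static unsigned feasible_paths(int bound) {
    unsigned seen = 0;
    for (int x_old = -bound; x_old <= bound; x_old++) {
        for (int in = -bound; in <= bound; in++) {
            int x = in;
            int b1 = (x >= x_old + 1);
            if (b1) x = x_old + 1;
            int b2 = (x <= x_old - 1);
            if (b2) x = x_old - 1;
            seen |= 1u << (2 * b1 + b2);
        }
    }
    return seen;
}
```

Of the four syntactic paths of the loop body, the one taking both branches is never observed: after the first assignment x equals x_old + 1, so the second test cannot hold.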

Nevertheless, their method allows expressing disjunctive invariants at loop
heads, and not only at intermediate points, as we do. However, we think it is
possible to get the best of both worlds and combine our method with theirs. In
order to obtain a disjunctive invariant, they first choose a "convexity witness"
(given that the number of possible witnesses is exponential, they choose it using
heuristics) [26, p. 7], and then they compute a "transitive closure" [26, Fig. 6],
which is a form of fixed point iteration of input-output relations (as in our
Sec. 4.4) over an expanded control-flow graph. The choice of the convexity witness
amounts to a partitioning of the nodes and transitions (Sec. 4.3). Thus, it seems
possible to apply their technique, but replace their fixed point iteration
[26, Fig. 6] by one based on SMT-solving and path focusing, using acceleration if
possible.

In recent years, because of improvements in SMT-solving, techniques such as
ours, distinguishing paths inside loops, have become tractable [31, 7, 32, 21]. An
alternative to using SMT-solving is to limit the number and length of traces to
consider, as in trace partitioning [36], used in the Astrée analyzer [16, 15, 8],
but the criteria for limitation tend to be ad hoc. In addition, methods for
abstracting the sets of paths inside a loop, weeding out infeasible paths, have
been introduced [3].

* It is equivalent if the only source of disjunctions are the splits in the control
flow, and not atomic operations. For instance, if the test $|x| > 1$ is considered
an atomic operation, then we could take the disjunction $x > 1 \lor x < -1$ as
output. We can rephrase that as a control flow problem by adding a test $x > 0$,
in other words by expressing $|x|$ as a piecewise linear function with explicit
tests for splits between the pieces.

With respect to optimality of the results, our method will generate the strongest
inductive invariant inside the abstract domain if the domain satisfies the
ascending chain condition and no widening is used; for other domains, like all
methods using widenings, it may or may not generate it. In contrast, some recent
works [21] are guaranteed to obtain the strongest invariant for the same analysis
problem, at the expense of a restriction to template linear domains and linear
constructions inside the code.

7 Conclusion and future work 

We have described a technique which leverages the bounded model checking ca- 
pacities of current SMT solvers for guiding the iterations of an abstract inter- 
preter. Instead of normal iterations, which "push" abstract values along control- 
flow edges, including control-flow splits and merges, we consider individual paths. 
This enables us, for instance, to use acceleration techniques that are not available 
when the program fragment being considered contains control-flow merges. This 
technique computes exact least invariants on some examples on which more con- 
ventional static analyzers incur gross imprecision or have to resort to syntactic 
heuristics in order to conserve precision. 

We have focused on numerical abstractions. Yet, one would like to use similar
techniques for heap abstractions, for instance. The challenge will then be to use
a decidable logic and an abstract domain such that both the semantics of the
program statements and the abstract values can be expressed in this logic. This is
one direction to explore. With respect to the partitioning technique (Sec. 4.3),
we currently express the partition as multiple explicit control nodes, but it
seems desirable, for large partitions (e.g. according to Boolean values, as in
B. Jeannet's BDD-Apron library), to represent them succinctly; this seems to fit
nicely with our succinct encoding of the transition relation as an SMT formula.

Another direction is to evaluate the scalability of these methods on larger
programs. The implementation needs further testing to evaluate the precision of
our method on middle-sized programs; the main advantage is that Aspic implements
some of the acceleration techniques. Analyzers such as Astrée scale up to
programs running a control loop several hundreds of thousands of lines long;
translating such a loop to an SMT formula and solving for this formula and
additional constraints does not seem tractable. It is possible that semantic
slicing techniques 051 could help in reducing the size of the generated SMT
problems.